Web Mining Accelerated with In-Memory and Column Store Technology
Abstract
Current web mining approaches use massive amounts of commodity hardware and processing time to provide analytics for today's web. To keep application interaction seamless, those approaches have to rely on pre-aggregated results and indexes to circumvent slow processing on their data stores, e.g., relational databases or document stores. In-memory, column-oriented databases are increasingly used to accelerate business analytics such as financial reports, but large text corpora have so far not benefited from this trend. We argue that although in-memory, column-oriented stores are tailor-made for traditional data schemas, they are also applicable to web mining applications, which mainly consist of raw text enriched with limited semantic metadata. We therefore implement a web mining application that stores all of its information in a pure main-memory data store. We observe an acceleration of current web mining queries and identify new opportunities for web mining applications. To evaluate the performance impact, we compare the run-time of general web mining tasks on a traditional row-oriented, disk-based database and on a column-oriented, in-memory database, using BlogIntelligence as a representative example of a web mining application.
Similar resources
Customer lifetime value model in an online toy store
Businesses all around the world use different approaches to know their customers, segment them, and formulate suitable strategies for them. One of these approaches is calculating the value of each customer to the company. In this paper, three customer segments are extracted by calculating the Customer Lifetime Value (CLV) of individual customers of an online toy store named Alakdolak. The level of ...
Interactional effects of bubble size, particle size, and collector dosage on bubble loading in column flotation
The success of flotation operation depends upon the thriving interactions of chemical and physical variables. In this work, the effects of particle size, bubble size, and collector dosage on the bubble loading in a continuous flotation column were investigated. In other words, this work was mainly concerned with the evaluation of the true flotation response to the changes in the operating varia...
Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-Scale R&D IT Projects depend on many sub-technologies which need to be understood and have their risks analysed before the project can begin for their success. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other is often complex and form a network. Discovery of this network of technologies is time consumi...
The Effect of the Slot Length on Beam Vertical Shear in I-Beams with Moment Connections
This paper evaluates the effect of slot existence with limited length between flanges and web junction of I-shaped beams at the region of moment connections on vertical force and shear stress distribution in beam flanges and web at connection section in comparison with classical theory of stress distribution. The main purpose of this research is to evaluate the efficiency of the slot in connect...
A Cost-Aware Strategy for Merging Differential Stores in Column-Oriented In-Memory DBMS
Fast execution of analytical and transactional queries in column-oriented in-memory DBMS is achieved by combining a read-optimized data store with a write-optimized differential store. To maintain high read performance, both structures must be merged from time to time. In this paper we describe a new merge algorithm that applies full and partial merge operations based on their costs and improvem...
Publication date: 2013